29 research outputs found

    Similarity-based and Iterative Label Noise Filters for Monotonic Classification

    Monotonic ordinal classification has received increasing interest in recent years. Building monotone models for these problems usually requires datasets that satisfy monotonic relationships among the samples. When the monotonic relationships are not met, changing the labels may be a viable option, but the risk is high: wrong label changes would completely alter the information contained in the data. In this work, we tackle the construction of monotone datasets by removing the wrong or noisy examples that violate monotonicity restrictions. We propose two monotonic noise filtering algorithms to preprocess ordinal datasets and improve the monotonic relations between instances. The experiments are carried out over eleven ordinal datasets, showing that applying the proposed filters improves prediction capabilities across different levels of noise.
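    The core operation such filters rely on, finding instance pairs that violate monotonicity and iteratively discarding the worst offenders, can be sketched as follows. This is a hypothetical simplification in Python: the function names and the greedy removal rule are illustrative, not the paper's exact algorithms.

```python
import numpy as np

def monotonicity_violations(X, y):
    """Count, per instance, the pairs that violate the monotonicity
    constraint: x_i dominated component-wise by x_j but y_i > y_j."""
    n = len(y)
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(X[i] <= X[j]) and y[i] > y[j]:
                counts[i] += 1  # both endpoints of a violating pair
                counts[j] += 1
    return counts

def iterative_monotonic_filter(X, y):
    """Greedily remove the instance involved in the most violations
    until the remaining dataset is fully monotone (illustrative rule)."""
    keep = list(range(len(y)))
    while len(keep) > 1:
        counts = monotonicity_violations(X[keep], y[keep])
        if counts.max() == 0:
            break
        del keep[int(counts.argmax())]
    return keep
```

    A single-feature example: with samples x = 1, 2, 3 and labels 2, 1, 3, the pair (1, 2) violates monotonicity, and removing either of its endpoints makes the dataset monotone.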

    From Big data to Smart Data with the K-Nearest Neighbours algorithm

    The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when dealing with big datasets, with potentially noisy and missing information, this technique becomes ineffective and inefficient. Because of its drawbacks in tackling large amounts of imperfect data, plenty of research has aimed at improving the algorithm by means of data preprocessing techniques. These weaknesses have turned into strengths, and the k-nearest neighbours rule has become a core model for detecting and correcting imperfect data, eliminating noisy and redundant samples as well as imputing missing values. In this work, we delve into the role of the k-nearest neighbour algorithm in producing smart data from big datasets. We analyse how this model is affected by the big data problem, but at the same time how it can be used to transform raw data into useful data. Concretely, we discuss the benefits of recent big data technologies (Hadoop and Spark) for enabling this model to address large amounts of data, as well as the usefulness of prototype reduction and missing-value imputation techniques based on it. As a result, guidelines on the use of the k-nearest neighbour rule to obtain smart data are provided, and new potential research trends are drawn.
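    As one concrete illustration of kNN acting as a noise filter, the classical Wilson editing rule (ENN) drops instances whose label disagrees with the majority of their neighbourhood. A minimal sketch, assuming a small in-memory dataset rather than the Hadoop/Spark setting the work discusses:

```python
import numpy as np
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN editing: keep an instance only when its label agrees
    with the majority label of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                       # exclude the point itself
        neighbours = np.argsort(dists)[:k]
        majority = Counter(y[neighbours]).most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return keep
```

    On two well-separated clusters, a point with a flipped label inside the opposite cluster is the one removed, which is exactly the noise-cleaning behaviour the survey describes.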

    Endogenous mechanisms of covert privatisation in public schools: education policies of results-based management and accountability in Andalusia

    In this article, we analyse education policies as constructions of knowledge and power, actions that represent institutional control. Standardised assessment tests and accountability policies with financial incentives for teachers are two covert-privatisation control technologies in public education that are transforming educational practice. We examine the changes these education policies have brought to public schools in Andalusia. This research presents preliminary conclusions from our participation in the research project "Dinámicas endógenas y exógenas de privatización en y de la educación: el modelo de cuasimercado en España" (Plan Nacional I+D+i REF/EDU 2010/20853).

    The NoiseFiltersR Package: Label Noise Preprocessing in R

    In data mining, the value of extracted knowledge is directly related to the quality of the data used. This makes data preprocessing one of the most important steps in the knowledge discovery process. A common problem affecting data quality is the presence of noise. A training set with label noise can reduce the predictive performance of classification learning techniques and increase the overfitting of classification models. In this work we present the NoiseFiltersR package. It contains the first extensive R implementation of classical and state-of-the-art label noise filters, which are the most common techniques for preprocessing label noise. The algorithms used for the implementation of the label noise filters are appropriately documented and referenced. They can be called in an R-user-friendly manner, and their results are unified by means of the "filter" class, which also benefits from adapted print and summary methods.
    Funding: Spanish Research Project TIN2014-57251-P; Andalusian Research Plan P11-TIC-7765; Brazilian grants CeMEAI-FAPESP 2013/07375-0, FAPESP 2012/22608-8 and FAPESP 2011/14602-7.
    Affiliations: Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain; Univ Sao Paulo, Inst Ciencias Matemat & Comp, Trabalhador Sao Carlense Av 400, BR-13560970 Sao Carlos, SP, Brazil; Univ Fed Sao Paulo, Inst Ciencia & Tecnol, Talim St 330, BR-12231280 Sao Jose Dos Campos, SP, Brazil.
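    The idea of a unified filter result (the cleaned data plus the indices of removed instances, with print/summary support) can be illustrated conceptually. The sketch below is Python, not the package's actual R API, and the class and field names are hypothetical:

```python
class FilterResult:
    """Hypothetical Python analogue of a unified noise-filter result:
    cleaned data, removed indices, and a summary, regardless of which
    filtering algorithm produced it."""
    def __init__(self, data, removed_idx, filter_name):
        removed = set(removed_idx)
        self.clean_data = [row for i, row in enumerate(data) if i not in removed]
        self.rem_idx = sorted(removed)
        self.filter_name = filter_name

    def summary(self):
        total = len(self.rem_idx) + len(self.clean_data)
        return f"{self.filter_name}: removed {len(self.rem_idx)} of {total} instances"
```

    Any concrete filter would then only need to compute `removed_idx`, and downstream code could treat all filters uniformly, which is the design benefit the abstract attributes to the "filter" class.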

    KEEL 3.0: an open source software for multi-stage analysis in data mining

    This paper introduces the third major release of the KEEL software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, design of multiple kinds of experiments, statistical analyses, etc. The framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms' results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL for dealing with modern data mining problems.

    Preclinical Activity of Eltrombopag (SB-497115), an Oral, Nonpeptide Thrombopoietin Receptor Agonist

    Eltrombopag is a first-in-class, orally bioavailable, small-molecule, nonpeptide agonist of the thrombopoietin receptor (TpoR), which is being developed as a treatment for thrombocytopenia of various etiologies. In vitro studies have demonstrated that the activity of eltrombopag depends on expression of TpoR, which activates the signal transducers and activators of transcription (STAT) and mitogen-activated protein kinase signal transduction pathways. The objective of this preclinical study was to determine whether eltrombopag interacts selectively with the TpoR to facilitate megakaryocyte differentiation into platelets. Functional thrombopoietic activity was demonstrated by the proliferation and differentiation of primary human CD34+ bone marrow cells into CD41+ megakaryocytes. Measurements in platelets from several species indicated that eltrombopag specifically activates only the human and chimpanzee STAT pathways. The in vivo activity of eltrombopag was demonstrated by an increase of up to 100% in platelet counts when administered orally (10 mg/kg per day for 5 days) to chimpanzees. In conclusion, eltrombopag interacts selectively with the TpoR without competing with Tpo, leading to increased proliferation and differentiation of human bone marrow progenitor cells into megakaryocytes and increased platelet production. These results suggest that eltrombopag and Tpo may act additively to increase platelet production.

    Brocal de pozo

    Peer reviewed

    Exact fuzzy k-Nearest neighbor classification for big datasets

    The k-Nearest Neighbors (kNN) classifier is one of the most effective methods in supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. Nevertheless, it gives every labeled sample the same importance in classification. There are several approaches to enhance its precision, the Fuzzy k Nearest Neighbors (FuzzykNN) classifier being among the most successful ones. FuzzykNN computes a fuzzy degree of membership of each instance to the classes of the problem. As a result, it generates smoother borders between classes. Apart from the existing kNN approach to handle big datasets, there is no fuzzy variant able to manage that volume of data. Moreover, calculating the class memberships adds an extra computational cost, making the method even less scalable for large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach to run the Fuzzy-kNN classifier on big datasets based on Spark, which provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach with datasets of up to 11 million instances, showing promising results.
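    The two-stage scheme can be sketched in a simplified, single-machine form using Keller's classical fuzzy-kNN membership formula. This is a hypothetical illustration of the underlying technique, not the distributed Spark implementation the paper presents:

```python
import numpy as np

def fuzzy_memberships(X, y, n_classes, k=3):
    """Stage 1: assign each training instance a soft class membership
    based on the labels of its k nearest neighbours (Keller's rule:
    0.51 + 0.49 * share for the instance's own class, 0.49 * share otherwise)."""
    n = len(X)
    U = np.zeros((n, n_classes))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the point itself
        nn = np.argsort(d)[:k]
        for c in range(n_classes):
            share = np.mean(y[nn] == c)        # fraction of neighbours in class c
            U[i, c] = (0.51 + 0.49 * share) if c == y[i] else (0.49 * share)
    return U

def fuzzy_knn_predict(X_train, U, x, k=3, m=2):
    """Stage 2: classify a query point by the distance-weighted average
    of its neighbours' membership vectors."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[nn], 1e-12) ** (2 / (m - 1))  # inverse-distance weights
    scores = (w[:, None] * U[nn]).sum(axis=0) / w.sum()
    return int(np.argmax(scores))
```

    Splitting the work this way mirrors the paper's design: the membership matrix is computed once over the training set, and the test set is then classified against the enriched training data.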